Accelerating Kernel Density Estimation on the GPU Using the CUDA Framework
نویسندگان
چکیده
The main problem of the kernel density estimation methods is the huge computational requirements, especially for large data sets. One way for accelerating these methods is to use the parallel processing. Recent advances in parallel processing have focused on the use Graphics Processing Units (GPUs) using Compute Unified Device Architecture (CUDA) programming model. In this work we discuss a naive and two optimised CUDA algorithms for the two kernel estimation methods: univariate and multivariate. These optimised algorithms are based on the use of shared memory tiles and loop unrolling techniques. We also present exploratory experimental results of the proposed CUDA algorithms according to the several values of parameters such as number of threads per block, tile size, loop unroll level, number of variables and data (sample) size. The experimental results show significant performance gains of all proposed CUDA algorithms over serial CPU version and small performance speed-ups of the two optimised CUDA algorithms over naive GPU algorithms. Finally, based on extended performance results are obtained general conclusions of all proposed CUDA algorithms for some parameters. 1448 P. D. Michailidis and K. G. Margaritis Mathematics Subject Classification: 30C40, 65Y05, 68M20, 68W10
منابع مشابه
Accelerating high-order WENO schemes using two heterogeneous GPUs
A double-GPU code is developed to accelerate WENO schemes. The test problem is a compressible viscous flow. The convective terms are discretized using third- to ninth-order WENO schemes and the viscous terms are discretized by the standard fourth-order central scheme. The code written in CUDA programming language is developed by modifying a single-GPU code. The OpenMP library is used for parall...
متن کاملAn approach to Improve Particle Swarm Optimization Algorithm Using CUDA
The time consumption in solving computationally heavy problems has always been a concern for computer programmers. Due to simplicity of its implementation, the PSO (Particle Swarm Optimization) is a suitable meta-heuristic algorithm for solving computationally heavy problems. However, despite the simplicity, the algorithm is inefficient for solving real computationally heavy problems but the pr...
متن کاملNumerical Simulation of a Lead-Acid Battery Discharge Process using a Developed Framework on Graphic Processing Units
In the present work, a framework is developed for implementation of finite difference schemes on Graphic Processing Units (GPU). The framework is developed using the CUDA language and C++ template meta-programming techniques. The framework is also applicable for other numerical methods which can be represented similar to finite difference schemes such as finite volume methods on structured grid...
متن کاملParallel Implementation of Particle Swarm Optimization Variants Using Graphics Processing Unit Platform
There are different variants of Particle Swarm Optimization (PSO) algorithm such as Adaptive Particle Swarm Optimization (APSO) and Particle Swarm Optimization with an Aging Leader and Challengers (ALC-PSO). These algorithms improve the performance of PSO in terms of finding the best solution and accelerating the convergence speed. However, these algorithms are computationally intensive. The go...
متن کاملHigh Performance Kernel Smoothing Library For Biomedical Imaging
of the Thesis High Performance Kernel Smoothing Library For Biomedical Imaging by Haofu Liao Master of Science in Electrical and Computer Engineering Northeastern University, May 2015 Dr. Deniz Erdogmus, Adviser The estimation of probability density and probability density derivatives has full potential for applications. In biomedical imaging, the estimation of the first and second derivatives ...
متن کامل